Debugging “multi-issue bugs”

When the same bug is caused by multiple issues and only fixing all of them at once will make it go away
profile photo
Adam Pietrasiak
I spent the last 3 days (2 alone and 1 together with Maciej from our team. 🙏) debugging some screen.studio issue
I cut some corners while doing it, making it way longer than it should be. As the kind of bug itself was quite interesting and it reminded me of some lessons to follow, I thought it might be a good opportunity to write about it.

What is a “multi-issue bug”?

It is probably not the official name for it, but in general:
By “multi-issue bug,” I mean a situation when one bug is caused by more than one unrelated thing at once. It means you must solve all those issues simultaneously for the bug to disappear. If you only solve one of the reasons but not the other, the bug will still be there.
An example could be that your internet is not working because your router is malfunctioning, and the cable from your router to your internet provider is damaged at the same time.

What is tricky about “multi-issue bug”?

Following the example above - you will wonder why your internet is broken. You’ll try to fix the router first. It doesn’t help. But you don’t give up so quickly. You try to borrow a router from your friend to see if it helps. It doesn’t help (you actually fixed the broken router problem, but the cable to the provider is still damaged), so you think, “ok, my friend’s router is brand new, so it is not about the router; it must be something else.” (This assumption will stick and cost you a lot of time and pulled hairs!)
Then you check the cable. You have some backup cable in your garage, and attach your router to your provider with it. It still doesn’t work (you fixed the cable problem, but you don’t use your friend’s router anymore, so the router issue is now preventing the internet from working).
As you can probably see, narrowing down a “multi-issue bug” is way harder than the regular bug caused by a single issue.

What happened in our case?

In Screen Studio, you can cut your video to skip some parts of your recording or speed up some parts.
The issue was that the more cuts you have, the more your microphone audio gets desynchronized with the video with time.
Image without caption
It was not noticeable with short recordings with only a few cuts or if you made no cuts.

Spoiler: what were the issue(s) causing it?

Two things were going wrong at the same time, and both of those were causing desynchronization between audio the video.

The first issue

FFMPEG (video/audio encoding lib we use) has an audio clip filter called atempo for modifying the audio speed while maintaining the voice pitch. It turns out it is not “duration accurate.” Even if you set atempo=1, which should technically change nothing, the final duration of the audio clip will slightly change (~5ms per clip, depending on the sample rate).

The second issue

We did incorrectly offset the “clip range” by one frame when calculating the part of the original audio file to “pick.”
It was not really a “huge” issue if all of your clips had the same speed (then the entire video would have constant video/audio offset of 1 frame [~16ms] - as it was constant, it was not piling up with time leading to bigger desync)
But the issue was getting nasty if your neighbor clips had various speeds.
Let’s say your video changes from speed 200% to 100% on the 60th second of it. We did, by accident, pick 60 seconds + 1 frame at 200% speed.
As it is 200% speed, each 1 played frame in the final video actually plays 2 frames of the underlying original video and audio (that’s why it is at 200% speed). Thus picking 1 extra “played frame” meant we actually picked 2 extra frames of the original material. This means it was adding ~16ms of desync on each such cut (assuming 60FPS export and speed change from 200% to 100%, it would be different on change from 150% to 100% or if exporting at different FPS, etc.)

The noise

At first, I was trying to adjust the settings, frames picking algorithm in 10s of different ways. It led to various results, but the video was still desyncing with time.
The tricky part about it is that I’m pretty sure I did fix the second issue while doing it at least a few times in different ways, but the first issue remained. It did trick me into thinking that the issue lies somewhere else and kinda ignoring it when trying other things or considering, “What the heck is it, then?”.

Ignorance and costly laziness

Image without caption
Maciej did record a special video for me where he ran a metronome and recorded it with the webcam simultaneously. I was a bit skeptical as I thought I had made a silly mistake that would not require precise analysis frame by frame. Spoiler alert: we needed to analyze it frame by frame and measure it precisely for hours while we started to realize the issue.

The process

I will now describe all milestones and learnings you might consider worth remembering.
Note that the part below is from the 3rd day of my debugging when Maciej joined me, and we started to be way more methodical.

Reduce the noise and create simple test data.

The first milestone was when we created the test project, which was 1 minute long and had 1 cut precisely every second. The cuts were (supposed to be) doing nothing as they were not changing the speed of the clips nor trimming away parts of it:
Image without caption
It turned out the video was still desyncing.
We did import it into Final Cut Pro to measure it precisely.
Image without caption
On each second, the “tick” sound from the metronome happened earlier and earlier than in the video. But at least there was some regularity here, and we could measure it (~4.5ms of desync per cut).
While knowing this “offset” was not helpful (it had nothing to do with export FPS or audio sample rate), we had a relatively simple reproduction. As I write it, I have no idea how I did not go that way earlier.
Here we went with quite a common debugging method: “Remove every feature of your app until the issue disappears.”
Image without caption
It turns out it was this atempo=1.0. It was really looking harmless (why would changing the audio tempo to 1.0 be harmful - in the end, changing tempo to 1.0 means “don’t change it”)
Then we found this mysterious comment on the FFMPEG forum from 10 years ago (https://trac.ffmpeg.org/ticket/2484):
Image without caption
Fix: Assuming atempo is messing with the final clip duration; we did a bit of a hack:
  • take slightly more audio than we need and apply atempo to it
  • then take the modified result and trim it again to the exact duration we need, but this time not using atempo so the duration is not altered.
Applying the fix did make it work for this simple video with cuts every 1 second and no speed changes. We had hoped that the issue was gone, but it turned out it was not when we loaded the real project which had the issue.
This was when we realized there was more than one bug resulting in the same issue. And the first facepalm you actually enjoy while debugging.
The second part of the issue consisted of actually 2 more bugs (again causing the same issue)
The first was that the “cuts” position on timeline has very high precision. It is common for it to have a “start time” like 0.52342592s. At the same time, when exporting the video - video frames occur at very precise intervals - each happens precisely 16.6666ms after another (if exporting 60fps). Thus it will almost always happen that the “clips” (and speed) change happens actually somewhere in between 2 video frames. And this has a nasty impact if clips change their speeds. If speed changes between frame 10 and 11, but we actually apply it as if it hanged at “frame 10,75” - 0,75 of frame time audio will be played at a different speed than the remaining 0,25 of the frame. Fix: rounding up audio cut time to frame times according to FPS we’re exporting at
Another sub-issue was a bit brain-boiling. Video frames are “discrete,” while audio is “continuous” (almost). Thus if you have frames 10, 11, 12, and 13, and the speed change happens at frame 11, you need to decide if you mean by this that audio changes its speed at the beginning of frame 11 or the end of it.
It turned out video exporter assumed it started to pick video frames faster at the beginning of frame 11 (remember that at high speed, every “output” video frame skips some “source frames”). In contrast, the audio exporter did switch at the end of frame 11.
Finally, fixing all of those issues made it work like a harm.

Summary

  • If debugging issues take a while, ask, “What if several bugs are involved at once.”
  • If it takes too long, invest extra time into creating a solid debug env with simple data that fits into your brain RAM, clear logs, and dev tools if relevant.
  • Ask someone to help you (even if you feel it will take longer to explain the issue than you hope it will take you to fix it finally)
  • Take breaks and sleep over it
  • If your issue happens in various cases and it looks like it is exactly the same issue - try to assume it might not be the same issue and pick the simplest case to debug as if it is a different bug
  • If you start to be tired and super frustrated, instead of panicking, start to be very methodical and scientific.
Related posts
post image
Don’t increase your Electron app code complexity as you add more windows to it
Due to a simple bug, Screen Studio app did generate over 2 petabytes of network traffic
Never read pixels into JavaScript memory.
Powered by Notaku